Abstract 

A curious observation was made that the rank statistics of scientific citation 
numbers follows the Zipf-Mandelbrot law. The same power-like behavior is 
exhibited by some simple random citation models. The observed regularity 
indicates not so much a peculiar character of the underlying (complex) process 
as its stochastic nature, which is probably stronger than is usually assumed. 



1 Introduction 



Let us begin by explaining what Zipf's law is. If we assign 
ranks to all words of some natural language according to their frequencies 
in some long text (for example the Bible), then the resulting frequency-rank 
distribution follows a very simple empirical law 

$$f(r) = \frac{a}{r^{\gamma}} \qquad (1)$$

with $a \approx 0.1$ and $\gamma \approx 1$. This was observed by G. K. Zipf for many 
languages a long time ago [1], [2]. More modern studies [3] also confirm the very 
good accuracy of this rather strange regularity. 
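As a minimal sketch of how such a rank-frequency table can be built, the following Python snippet counts the words of a text and lists their relative frequencies by rank. The function name `rank_frequency` and the crude whitespace tokenization are assumptions made for illustration only; they are not Zipf's actual procedure.

```python
from collections import Counter


def rank_frequency(text):
    """Return [(rank, word, relative_frequency), ...], most frequent word first."""
    words = text.lower().split()          # crude tokenization, assumed for illustration
    counts = Counter(words)
    total = sum(counts.values())
    table = []
    for rank, (word, n) in enumerate(counts.most_common(), start=1):
        table.append((rank, word, n / total))
    return table


if __name__ == "__main__":
    sample = "the cat and the dog and the bird"
    for rank, word, freq in rank_frequency(sample):
        print(rank, word, round(freq, 3))
```

For a long real text one would then compare the third column with $a/r^{\gamma}$.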

In his attempt to derive Zipf's law from information theory, 
Mandelbrot [4], [5] produced a slightly generalized version of it: 

$$f(r) = \frac{p_1}{(p_2 + r)^{p_3}} \qquad (2)$$

with $p_1$, $p_2$, $p_3$ all being constants. 
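The relation between the two formulas can be made explicit with a short sketch: the plain Zipf law is the special case of the Mandelbrot formula with a vanishing shift in the rank. The function names below are illustrative, and the default values are just the rough Zipf values quoted above.

```python
def zipf_mandelbrot(r, p1, p2, p3):
    """Zipf-Mandelbrot value p1 / (p2 + r)**p3 for rank r."""
    return p1 / (p2 + r) ** p3


def zipf(r, a=0.1, gamma=1.0):
    """Plain Zipf law a / r**gamma: the special case p2 = 0."""
    return zipf_mandelbrot(r, a, 0.0, gamma)


if __name__ == "__main__":
    # Compare the two laws at a few ranks (the shift 2.5 is an arbitrary choice).
    for r in (1, 2, 10, 100):
        print(r, zipf(r), zipf_mandelbrot(r, 0.1, 2.5, 1.0))
```

A positive shift flattens the curve at small ranks and leaves the large-rank tail power-like.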

The same inverse power-law statistical distributions were found in 
embarrassingly different situations (for reviews see [6], [7]). In economics, the 
law was discovered by Pareto [8] long before Zipf and states that incomes of 
individuals or firms are inversely proportional to their rank. In less formal 
words [9], "most success seem to migrate to those people or companies 
who already are very popular". In demography [2], [10], [11], city sizes 
(populations) are also power-like functions of the cities' ranks. The same 
regularity reveals itself in the distributions of areas covered by satellite 
cities and villages around huge urban centers [12]. 

Remarkably enough, as is claimed in [13], in countries such as the former 
USSR and China, where natural demographic processes were significantly 
distorted, city sizes do not follow Zipf's law! 

Other examples of Zipfian behavior are encountered in chaotic dynamical 
systems with multiple attractors [14], in biology [15], ecology [16], social 
sciences, etc. [17]. 



Even the distribution of fundamental physical constants, according to 
[18], follows the inverse power law! 

The most recent examples of Zipf-like distributions are related to the 
World Wide Web surfing process [19], [20]. 

You say that all this sounds like a joke and looks improbable? So did 
I when I became aware of this weird law from M. Gell-Mann's book "The 
Quark and the Jaguar" [21] some days ago. But here is the distribution 
of the 50 largest USA cities according to their rank [22], fitted by Eq. (2): 




The actual values of fitted parameters depend on the details of the fit. I 
assume (rather arbitrarily) 5% errors in data. 
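A minimal sketch of such a fit, under stated assumptions, is given below. It minimizes $\chi^2$ for the Mandelbrot formula with errors equal to a fixed fraction of each data value (5% by default). The function names and the coarse grid search over $p_2$ and $p_3$ are illustrative choices, not the procedure actually used for the figures; only the prefactor $p_1$ is solved exactly, since $\chi^2$ is quadratic in it.

```python
def chi2_best_p1(ranks, values, p2, p3, rel_err=0.05):
    """For fixed p2, p3 return (chi2, p1) with p1 chosen to minimize chi2.

    Errors are taken as rel_err * value. Requires p2 + r > 0 for every rank.
    """
    shape = [1.0 / (p2 + r) ** p3 for r in ranks]
    sigma2 = [(rel_err * y) ** 2 for y in values]
    # chi2 is quadratic in p1, so the optimal p1 has a closed form.
    num = sum(y * m / s for y, m, s in zip(values, shape, sigma2))
    den = sum(m * m / s for m, s in zip(shape, sigma2))
    p1 = num / den
    chi2 = sum((y - p1 * m) ** 2 / s for y, m, s in zip(values, shape, sigma2))
    return chi2, p1


def grid_fit(ranks, values, p2_grid, p3_grid, rel_err=0.05):
    """Coarse grid search over (p2, p3); returns (chi2, p1, p2, p3)."""
    best = None
    for p2 in p2_grid:
        for p3 in p3_grid:
            chi2, p1 = chi2_best_p1(ranks, values, p2, p3, rel_err)
            if best is None or chi2 < best[0]:
                best = (chi2, p1, p2, p3)
    return best


if __name__ == "__main__":
    # Synthetic data generated from the formula itself, NOT real city sizes.
    ranks = list(range(1, 51))
    values = [1000.0 / (2.0 + r) ** 0.8 for r in ranks]
    print(grid_fit(ranks, values, [0.0, 1.0, 2.0, 3.0], [0.6, 0.8, 1.0]))
```

A different error model only changes the `sigma2` line, which is one way to see how strongly the fitted parameters can depend on that choice.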

Maybe it is worthwhile to recall here the old story about a young 
priest who complains to his father about having a very difficult theme for his 
first public sermon: the virgin birth. 

- "Look, father", he says, "if some young girl from this town becomes 
pregnant, comes to you and says that this is because of the Holy Spirit, would 
you believe it?" 

The father stays silent for a while, then answers: 

- "Yes, son, I would. If the baby were born, if he were raised and 
if he lived like Christ". 

So, clearly, you need more empirical evidence to accept improbable 
things. Here is one more example, the list of the most populated countries [23] 
fitted by the Mandelbrot formula (2): 




Even the simpler Zipfian a/r parameterization works fairly well in this 
case! 

2 Fun with citations 

But all this was known long ago. Of course it is exciting to check its 
correctness personally. But it is more exciting to find out whether this rule 
still holds in a new area. The SPIRES database provides an excellent opportunity 
to check scientific citations against the Zipf-Mandelbrot regularity. 

As I became involved in these matters because of M. Gell-Mann's book, 
my first try naturally was his own citations. The results were encouraging: 




[Figure: M. Gell-Mann's citations versus rank; horizontal axis: Rank, 5 to 30] 



But maybe M. Gell-Mann is not the best choice for this goal. SPIRES 
is a rather novel phenomenon, and many of M. Gell-Mann's important papers 
were written long before its creation. So they are poorly represented in 
the database. Therefore, let us try the present-day citation favorite, E. Witten. 
Here are his 160 most cited papers according to SPIRES (note once 
more that the values of fitted parameters may depend significantly on the 
details of the fit. In this and the previous case I choose $\sqrt{N}$ as an estimate 
for data errors, so as not to ascribe too much importance to data points with 
small numbers of citations. On other occasions I assume 5% errors. Needless to 
say, both choices are arbitrary): 

You have probably noticed the very big values of the prefactor $p_1$. Of course 
this is related to the rather big values of the other two parameters. We can 
understand the big value of the $p_2$ parameter as follows. The data set of an 
individual physicist's papers is a subset of the fuller data about all 
physicists. So we can think of $p_2$ as being an average number of papers from 
other scientists between two given papers of the physicist under consideration. 
Whether right or not, this explanation gains some empirical support if we consider 
the top cited papers in SPIRES [25] (the Review of Particle Physics is excluded): 

As we see, $p_2$ is fairly small now. 

Finally, it is possible to find a list of the 1120 most cited physicists (not 
only from High Energy Physics) on the World Wide Web [26]. Again 
the Mandelbrot formula (2) with $p_1 = 3.81 \cdot 10^4$, $p_2 = 10.7$ and $p_3 = 0.395$ 
gives an excellent fit. Now there are too many points, making it difficult 
to see the differences between the curve and the data visually. In the figure 
that follows, we show this relative difference explicitly. 

For the bulk of the data, the Mandelbrot curve gives a precision 
better than 5%! 
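The relative difference meant here can be sketched as follows, assuming it is defined as (data - fit) / fit; the text does not spell out the definition, so this is one plausible reading. The parameter values are those quoted above for the list of 1120 physicists, while the counts in the usage example are hypothetical placeholders, not the real data.

```python
def mandelbrot(r, p1, p2, p3):
    """Mandelbrot formula p1 / (p2 + r)**p3."""
    return p1 / (p2 + r) ** p3


def relative_difference(counts, p1, p2, p3):
    """(data - fit) / fit for each rank; counts must be sorted, rank 1 first."""
    return [
        (n - mandelbrot(r, p1, p2, p3)) / mandelbrot(r, p1, p2, p3)
        for r, n in enumerate(counts, start=1)
    ]


if __name__ == "__main__":
    p1, p2, p3 = 3.81e4, 10.7, 0.395   # values quoted in the text
    # Hypothetical counts: the fitted curve scaled up or down by 3%.
    fake = [mandelbrot(r, p1, p2, p3) * (1.03 if r % 2 else 0.97)
            for r in range(1, 11)]
    print([round(d, 3) for d in relative_difference(fake, p1, p2, p3)])
```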

You wonder why $p_2$ is now relatively high? I really do not know. Maybe 
the list is still incomplete in its lower-rank part. In any case, if you take 
just the first 100 entries from this list, the fit results in $p_1 = 2.1 \cdot 10^4$, 
$p_2 = -0.09$, $p_3 = 0.271$. This example also shows that the Mandelbrot 
curve with constant $p_1$, $p_2$, $p_3$ is actually not as good an approximation as 
one might judge from the histograms given above, because different parts of the 
data prefer different values of the Mandelbrot parameters. 

3 Any explanation? 

The general character of the Zipf-Mandelbrot law is hypnotizing. We 
have already mentioned several wildly different areas where it was encountered. 
Can it be considered as some universal law for complex systems? And if so, 
what is the underlying principle which unifies all of these seemingly different 
systems? What kind of principle can be common to natural languages, 
the distribution of individual wealth in some society, urban development, 
scientific citations, and the frequency distribution of female first names? The 
latter is reproduced below [27]: 

Another question is whether the Mandelbrot parameters $p_2$ and $p_3$ 
can tell us something about the (complex) process which triggered the 
corresponding Zipf-Mandelbrot distribution. For this goal an important 
issue is how to perform the fit (least squares, $\chi^2$, method of moments [20] or 
something else?). I do not have any answer to this question now. However, 
let us compare the parameters for the female first name distribution from 
the histogram given above and for the male first name distribution (the data 
are taken from the same source [27]). In both cases a $\chi^2$ fit was applied with 
errors assumed for each point. 

The power-counting parameter $p_3$ is the same for both distributions, 
although the $p_2$ parameter has different values. 

If you are fascinated by the possibility that very different complex systems 
can be described by a single simple law, you will maybe be disappointed (as 
I was) to learn that some simple stochastic processes can lead to the very same 
Zipfian behavior. Say, what profit will you have from knowing that some 
text exhibits Zipf's regularity, if this gives you no idea whether the text was 
written by Shakespeare or by a monkey? Alas, it was shown [4], [28], [29], [30] that 
random texts ("monkey languages") exhibit a Zipf's-law-like word frequency 
distribution. So Zipf's law seems to be at least [5] "linguistically very 
shallow" and [^j "is not a deep law in natural language as one might first 
have thought" . 
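A toy "monkey language" of this kind is easy to generate. The sketch below assumes the simplest possible monkey: every key, including the space bar, is hit with equal probability. The alphabet, the seed and the function names are arbitrary illustrative choices and are not taken from the cited works.

```python
import random
from collections import Counter


def monkey_text(n_chars, alphabet="abcd", seed=0):
    """Random text: every key (the letters plus the space bar) is equally likely."""
    rng = random.Random(seed)
    keys = alphabet + " "
    return "".join(rng.choice(keys) for _ in range(n_chars))


def ranked_counts(text):
    """(word, count) pairs sorted from the most to the least frequent word."""
    return Counter(text.split()).most_common()


if __name__ == "__main__":
    ranked = ranked_counts(monkey_text(200_000))
    # Print the counts at roughly logarithmically spaced ranks.
    for rank in (1, 2, 4, 8, 16, 32, 64):
        word, n = ranked[rank - 1]
        print(rank, repr(word), n)
```

Printing the counts at logarithmically spaced ranks gives a quick impression of how the frequency falls with rank, without any fitting.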

Two different approaches to the explanation of Zipf's law are very well 
summarized in G. Miller's introduction to the 1965 edition of Zipf's book [1]: 
"Faced with this massive statistical regularity, you have two alternatives. 
Either you can assume that it reflects some universal property of human 
mind, or you can assume that it reflects some necessary consequence of 
the laws of probabilities. Zipf chose the synthetic hypothesis and searched 
for a principle of least effort that would explain the apparent equilibrium 
between uniformity and diversity in our use of words. Most others who 
were subsequently attracted to the problems chose the analytic hypothesis 
and searched for a probabilistic explanation. Now, thirty years later, it 
seems clear that the others were right. Zipf's curves are merely one way 
to express a necessary consequence of regarding a message source as a 
stochastic process" . 

Were the "others" indeed right? Even in the realm of linguistics the debate 
is still not over after another thirty years have passed [31]. In the case of 
random texts, the origin of Zipf's law is well understood [32], [33]. In 
fact such texts exhibit no Zipfian distribution at all, but a log-normal 
distribution, the latter giving in some cases a very good approximation to 
Zipf's law. So there is no doubt that simple stochastic (Bernoulli or 
Markov) processes can lead to Zipfian behavior. No dynamically nontrivial 
properties (interactions and interdependence) are required at all from 
the underlying system. But it was also stressed in the literature [33], [14] 
that this fact does not preclude more complex and realistic systems from 
exhibiting Zipfian behavior because of underlying nontrivial dynamics. In this 
case, we can hope that the Zipf-Mandelbrot parameters will be meaningful 
and can tell us something about the system's properties. Let us note that the 
rank-frequency distribution for complex systems is not always Zipfian. For 
example, if we consider the frequency of occurrence of letters, instead of 
words, in a long text, the empirical universal behavior, valid over 100 natural 
languages with alphabet sizes ranging between 14 and 60, is logarithmic [34]: 

$$f(r) = A - B \ln r$$

where $A$ and $B$ are constants. This fact, of course, is interesting by itself. 
It is argued in [35] that both regularities (Zipfian and logarithmic) can have 
a common stochastic origin. 
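Since the logarithmic law is linear in $\ln r$, its constants can be estimated by ordinary least squares. The sketch below is only an illustration of that remark: the function name is hypothetical, all points are weighted equally (an assumption), and no real letter frequencies are included.

```python
import math


def fit_log_law(freqs):
    """Least-squares A, B in f(r) = A - B*ln(r).

    freqs must be sorted with rank 1 first and contain at least two values.
    """
    xs = [math.log(r) for r in range(1, len(freqs) + 1)]
    n = len(freqs)
    mean_x = sum(xs) / n
    mean_y = sum(freqs) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, freqs))
    slope = sxy / sxx
    return mean_y - slope * mean_x, -slope    # (A, B)


if __name__ == "__main__":
    # Synthetic frequencies for a 26-letter alphabet, generated from the law itself.
    synthetic = [0.12 - 0.03 * math.log(r) for r in range(1, 27)]
    print(fit_log_law(synthetic))
```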

An interesting example of the Zipf-Mandelbrot parameters being useful 
and effective is provided by ecology [36], [37]. The exponent $p_3$ is related to 
the evenness of the ecological community. It has higher values for "simple" 
and lower values for "complex" systems. The parameter $p_2$ is related to 
the "diversity of the environment" [37] and serves as a measure of the 
complexity of initial preconditions. 

The other pole in the explanation of Zipf's law seeks some universal 
principle behind it, such as "least effort" [2], "minimum cost" [4], "minimum 
energy" [38] or "equilibrium" [39]. The most impressive and, as the above 
ecological example shows, fruitful explanation is given by B. Mandelbrot 
[5], [40] and is based on fractals and self-similarity. 

As we see, the suggested explanations are almost as numerous as the 
observed manifestations of this universal power-like behavior. This probably 
indicates that some important ingredient of this regularity still escapes 
our grasp. As M. Gell-Mann concludes [21], "Zipf's law remains essentially 
unexplained". 



4 The almighty chance 

If monkeys can write texts they can make citations too! So let us imagine 
the following random citation model. 

• At the beginning there is one "seminal" paper. 

• Every subsequent paper makes at most ten citations (or cites all 
preceding papers if their number does not exceed ten). 

• All preceding papers have an equal probability of being cited. 

• Multiple citations are excluded. So if some paper is selected by chance 
as a citation candidate more than once, the repeated selection is ignored (in 
this case the total number of citations in the new paper will be less than 
ten). 

I have my doubts about monkeys, but it is simple to teach a computer to 
simulate such a process. Here is the result of the simulation for 1000 papers. 
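The rules listed above can be sketched in a few lines of Python. This is a minimal reading of the model, not the program actually used for the figure: the function name, the seed and the choice to draw the ten candidates independently with replacement are assumptions.

```python
import random


def simulate_uniform(n_papers=1000, max_refs=10, seed=1):
    """Citation counts, sorted in decreasing order, for the equal-probability model."""
    rng = random.Random(seed)
    citations = [0]                       # the single "seminal" paper
    for new in range(1, n_papers):        # `new` preceding papers exist at this point
        if new <= max_refs:
            cited = set(range(new))       # cite every preceding paper
        else:
            # max_refs draws with replacement; repeated picks are ignored,
            # so the reference list may end up shorter than max_refs.
            cited = {rng.randrange(new) for _ in range(max_refs)}
        for paper in cited:
            citations[paper] += 1
        citations.append(0)               # the new paper starts with no citations
    return sorted(citations, reverse=True)


if __name__ == "__main__":
    ranked = simulate_uniform()
    for rank in (1, 10, 100, 1000):
        print(rank, ranked[rank - 1])
```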

So we see an apparent power-like structure, although with staircase 
behavior. We expect this stepwise structure to disappear if we eliminate the 
democracy between papers and make some papers more likely to be 
cited. 

Note that even the value of the exponent is reasonably close to what 
was really observed for the most cited papers. But this may be merely an 
accident, and I would not like to draw far-fetched conclusions about the 
nature of the citation process from this fact. 

In reality "Success seems to attract success" [9]. Therefore, let us try to 
see what happens if the equal probability axiom is replaced by a perhaps 
more realistic one: 

• The probability for a paper to be cited is proportional to n + 1, where 
n is the present total citation number for the paper. 

It is still assumed that all preceding papers compete to be cited by a new 
paper, but with probabilities that follow from the law given above. The 
result for 1000 papers now looks like 




[Figure: simulation result for 1000 papers; horizontal axis: Rank] 



The fit seems not so good now; nevertheless you can notice some 
resemblance to the case of individual scientists. Again I refrain from premature 
conclusions, although it is not entirely surprising that the better known a 
given paper of a certain author is, the more probable its citation 
in a new paper becomes. 
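The modified model of this section, with citation probability proportional to n + 1, can be sketched in the same spirit. As before this is only one possible reading: the function name and seed are arbitrary, and the weights are assumed to stay fixed while a single new paper picks its candidates.

```python
import random


def simulate_preferential(n_papers=1000, max_refs=10, seed=1):
    """Citation counts, sorted in decreasing order, for the 'n + 1' model."""
    rng = random.Random(seed)
    citations = [0]                       # the single "seminal" paper
    for new in range(1, n_papers):        # `new` preceding papers exist at this point
        if new <= max_refs:
            cited = set(range(new))       # cite every preceding paper
        else:
            # Probability proportional to n + 1, with n the current citation count.
            weights = [n + 1 for n in citations]
            picks = rng.choices(range(new), weights=weights, k=max_refs)
            cited = set(picks)            # repeated picks are ignored
        for paper in cited:
            citations[paper] += 1
        citations.append(0)               # the new paper starts with no citations
    return sorted(citations, reverse=True)


if __name__ == "__main__":
    ranked = simulate_preferential()
    for rank in (1, 10, 100, 1000):
        print(rank, ranked[rank - 1])
```

The "+ 1" in the weights is what allows a paper with no citations yet to be cited at all.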

5 Discussion 

So scientific citations (leaving aside first name frequencies) provide one 
more example of the Zipf-Mandelbrot regularity. I do not know whether this 
fact indicates only a significant stochastic nature of the process or 
something else. In any case SPIRES, and the World Wide Web in general, give 
us an excellent opportunity to study the characteristics of the complex 
process of scientific citations. 

I do not know either whether Mandelbrot's parameters are meaningful 
in this case, and if they can tell us something non-trivial about the citation 
process. 

The very generality of the Zipf-Mandelbrot regularity can make it 
rather "shallow". But remember that the originality of the answers to the 
question of whether there is something serious behind the Zipf-Mandelbrot 
law depends on how restrictive a framework we assume for the answer. A shallow 
framework will probably guarantee shallow answers. But if we do not 
restrict our imagination from the beginning, the answers can turn out to be quite 
non-trivial. For example, fractals and self-similarity are certainly great and 
not shallow ideas. This point is very well illustrated by the "Barometer 
Story", which I like so much that I'm tempted to reproduce it here (it is 
reproduced as given in M. Gell-Mann's book [21]). 



6 The Barometer Story — by Dr. A. Calandra 

Some time ago, I received a call from a colleague who asked if I would be 
the referee on the grading of an examination question. It seemed that he 
was about to give a student a zero for his answer to a physics question, 
while the student claimed he should receive a perfect score and would do so 
if the system were not set up against the student. The instructor and the 
student agreed to submit this to an impartial arbiter, and I was selected... 

I went to my colleague's office and read the examination question, which 
was, "Show how it is possible to determine the height of a tall building with 
the aid of a barometer." 

The student's answer was, "Take the barometer to the top of the build- 
ing, attach a long rope to it, lower the barometer to the street, and then 
bring it up, measuring the length of the rope. The length of the rope is the 
height of the building." 

Now this is a very interesting answer, but should the student get credit 
for it? I pointed out that the student really had a strong case for full credit, 
since he had answered the question completely and correctly. On the other 
hand, if full credit were given, it could well contribute to a high grade for 
the student in his physics course. A high grade is supposed to certify that 
the student knows some physics, but the answer to the question did not 
confirm this. With this in mind, I suggested that the student have another 
try at answering the question. I was not surprised that my colleague agreed 
to this, but I was surprised that the student did. 

Acting in the terms of the agreement, I gave the student six minutes 
to answer the question, with the warning that the answer should show 
some knowledge of physics. At the end of five minutes, he had not written 
anything. I asked if he wished to give up, since I had another class to take 
care of, but he said no, he was not giving up, he had many answers to 
this problem, he was just thinking of the best one. I excused myself for 
interrupting him and asked him to please go on. In the next minute, he dashed 
off his answer, which was: "Take the barometer to the top of the building, and 
lean over the edge of the roof. Drop the barometer, timing its fall with a 
stopwatch. Then, using the formula $s = at^2/2$, calculate the height of the 
building." 

At this point, I asked my colleague if he would give up. He conceded 
and I gave the student almost full credit. In leaving my colleague's office, I 
recalled that the student had said that he had other answers to the problem, 
so I asked him what they were. 

"Oh, yes," said the student. "There are many ways of getting the height 
of a tall building with the aid of a barometer. For example, you could take 
the barometer out on a sunny day and measure the height of the barometer, 
the length of its shadow, and the length of the shadow of the building, and 
by the use of simple proportion, determine the height of the building." 

"Fine," I said. "And the others?" 

"Yes", said the student. "There is a very basic measurement that you 
will like. In this method, you take the barometer and begin to walk up the 
stairs. As you climb the stairs, you mark off the length and this will give 
you the height of the building in barometer units. A very direct method." 

"Of course, if you want a more sophisticated method, you can tie the 
barometer to the end of a string, swing it as a pendulum, and determine 
the value of g at the street level and at the top of the building. From the 
difference between the two values of g, the height of the building can, in 
principle, be calculated." 

Finally, he concluded, "If you don't limit me to physics solutions to this 
problem, there are many other answers, such as taking the barometer to 
the basement and knocking on the superintendent's door. When the super- 
intendent answers, you speak to him as follows: 
Dear Mr. Superintendent, here I have a very fine barometer. If you will 
tell me the height of this building, I will give you this barometer ..." 

Note added 



After this paper was completed and submitted to the e-Print Archive, I 
learned that the Zipf distribution in scientific citations had in fact been 
discovered earlier by S. Redner [41]. He also cites some previous studies on 
citations, which were unknown to me. 

I also became aware of G. Parisi's interesting contribution [42] from Dr. 
S. Juhos. 

I thank S. Redner and S. Juhos for their correspondence. 